
    Advanced document data extraction techniques to improve supply chain performance

    In this thesis, a novel machine learning technique for extracting text-based information from scanned images has been developed. The extraction is performed in the context of scanned invoices and bills used in financial transactions. These documents contain a considerable amount of data that must be extracted, refined, and stored digitally before it can be used for analysis. Converting this data into a digital format is often time-consuming. Automation and data optimisation show promise as methods for reducing the time and cost of Supply Chain Management (SCM) processes, especially Supplier Invoice Management (SIM), Financial Supply Chain Management (FSCM), and Supply Chain procurement. This thesis takes a cross-disciplinary approach spanning Computer Science and Operational Management to explore the benefit of automated invoice data extraction in business and its impact on SCM. The study adopts a multimethod approach based on empirical research, surveys, and interviews conducted with selected companies. The expert system developed in this thesis focuses on two distinct areas of research: text/object detection and text extraction. For text/object detection, the Faster R-CNN model was analysed. While this model yields outstanding object-detection results, its performance degrades when image quality is low. A Generative Adversarial Network (GAN) model is proposed in response to this limitation: a generator built on the Faster R-CNN model, paired with a discriminator based on PatchGAN. The output of the GAN model is text data with bounding boxes.
    For text extraction from the bounding boxes, a novel data extraction framework was designed, consisting of several processes: XML processing (where an existing OCR engine is used), bounding-box pre-processing, text clean-up, OCR error correction, spell checking, type checking, pattern-based matching, and finally a learning mechanism for automating future data extraction. Whichever fields the system extracts successfully are provided in key-value format. The efficiency of the proposed system was validated using existing datasets such as SROIE and VATI. Real-time data was validated using invoices collected by two companies that provide invoice automation services in various countries. Currently, these scanned invoices are sent to an OCR system such as OmniPage, Tesseract, or ABBYY FRE to extract text blocks, and a rule-based engine is then used to extract the relevant data. While this methodology is robust, the companies surveyed were not satisfied with its accuracy and therefore sought new, optimised solutions. To confirm the results, the engines were used to return XML-based files with the identified text and metadata. The output XML data was then fed into the new system for information extraction. This system uses both the existing OCR engine and a novel, self-adaptive, learning-based OCR engine; the new engine is based on the GAN model for better text identification. Experiments were conducted on various invoice formats to further test and refine its extraction capabilities. For cost optimisation and spend-classification analysis, additional data were provided by another company, based in London, with expertise in reducing its clients' procurement costs. This data was fed into the system to obtain a deeper level of spend classification and categorisation.
    This helped the company to reduce its reliance on human effort and allowed for greater efficiency compared with performing similar tasks manually using Excel sheets and Business Intelligence (BI) tools. The intention behind the development of this novel methodology was twofold: first, to develop and test a solution that does not depend on any specific OCR technology; second, to increase information-extraction accuracy over that of existing methodologies. Finally, the thesis evaluates the real-world need for the system and the impact it would have on SCM. The newly developed method is generic and can extract text from any given invoice, making it a valuable tool for optimising SCM. In addition, the system uses a template-matching approach to ensure the quality of the extracted information.

    Identify and Handling of Risk Analysis by Parallelism

    The software industry has its own place in the economic growth of a country, and India's share is no small part of the whole: as per current trends, India accounts for almost 20% of the world's software economy. The tactics and development processes of software companies always require a certain amount of improvement and certainty to achieve their targets, and multiple factors can affect the development process. In this research paper I address the critical problem of risk handling and its resolution. There are various models and methods we can follow to calculate risk statically, but that is not enough. If we really want to optimize the results, as well as the certainty of success, we should improve on our traditional procedures. The later part of this paper presents a short improvisation of mine over the traditional approach. This method improves our ability to predict risks as well as to rectify those problems.

    A Multi-Phase Flow Model Incorporated with Population Balance Equation in a Meshfree Framework

    This study deals with the numerical solution of a meshfree coupled model of Computational Fluid Dynamics (CFD) and the Population Balance Equation (PBE) for liquid-liquid extraction columns. In modeling the coupled hydrodynamics and mass transfer in liquid extraction columns, one encounters a multidimensional population balance equation that cannot be fully resolved numerically within a time reasonable for steady-state or dynamic simulations. For this reason, there is an obvious need for a new liquid extraction model that captures all the essential physical phenomena yet remains computationally tractable. This thesis discusses a new model that focuses on the discretization of the external (spatial) and internal coordinates such that the computational time is drastically reduced. For the internal coordinates, the multi-primary particle method, a special case of the Sectional Quadrature Method of Moments (SQMOM), is used to represent the droplet internal properties. This model is capable of conserving the most important integral properties of the distribution, namely the total number, solute and volume concentrations, and reduces the computational time compared with classical finite difference methods, which require many grid points to conserve the desired physical quantities. On the other hand, due to the discrete nature of the dispersed phase, a meshfree Lagrangian particle method, the Finite Pointset Method (FPM), is used to discretize the spatial domain (the extraction column height). This method avoids the extremely difficult discretization of the convective term with classical finite volume methods, which require many grid points to capture the moving fronts propagating along the column height.
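    For reference, a spatially one-dimensional droplet population balance of the kind such models discretize has the generic form below; the notation is the standard one, not taken verbatim from the thesis.

```latex
\frac{\partial n(v, z, t)}{\partial t}
  + \frac{\partial}{\partial z}\bigl( u_d(v, z, t)\, n(v, z, t) \bigr)
  = S_{\mathrm{break}}\{n\} + S_{\mathrm{coal}}\{n\},
\qquad
\mu_k(z, t) = \int_0^\infty v^{k}\, n(v, z, t)\, \mathrm{d}v,
```

    where $n$ is the droplet number density over droplet volume $v$ and column height $z$, $u_d$ is the droplet velocity, and $S_{\mathrm{break}}$, $S_{\mathrm{coal}}$ are the breakage and coalescence source terms. The low-order moments $\mu_0$ (total number) and $\mu_1$ (total dispersed-phase volume) are among the integral properties the sectional scheme is designed to conserve.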

    Y2K38 Bug

    This paper discusses the problem we will all face precisely on 19th Jan. 2038. The digital world has been vulnerable to several bugs, but only a few seemed to pose a serious danger. The most famous was Y2K; somehow, we got over it. Then there was Y2K10, which was also resolved, and now we have Y2K38. The Y2K38 bug, if not resolved, will ensure that the predictions made for the Y2K bug come true this time. On 19th Jan. 2038, the time and date in every UNIX-based operating system will begin to work incorrectly. This produces a major problem that is critical for all software and, fundamentally, a threat to embedded systems. The paper then presents possible solutions we can adopt to avoid the Unix Millennium 2038 problem. Through this research I have developed a patch, which I have now moved to GitHub. The patch enables 32-bit software to use a 64-bit time_t on a 32-bit computer. The patch is still under development, but its current state is shown in this paper, along with all the code and a brief description of every function and file. I think the quickest way to solve this problem is this patch: software developers with 32-bit software simply need to include the header and implementation files in their software and use the functions that I, along with the open-source community, have made. The coding has been done in the C language.